An Effective Decision Procedure for Linear 
Arithmetic with Integer and Real Variables * 

BERNARD BOIGELOT, SEBASTIEN JODOGNE and PIERRE WOLPER 
Universite de Liege 
Institut Montefiore, B28 
4000 Liege, Belgium 



This paper considers finite-automata based algorithms for handling linear arithmetic with both real
and integer variables. Previous work has shown that this theory can be dealt with by using finite
automata on infinite words, but this involves algorithms that are difficult and delicate to implement.
The contribution of this paper is to show, using topological arguments, that only a restricted class
of automata on infinite words is necessary for handling real and integer linear arithmetic. This
allows the use of substantially simpler algorithms, which have been successfully implemented.

Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification
— Formal methods; F.1.1 [Computation by abstract devices]: Models of computation —
Automata; F.4.1 [Mathematical Logic and formal languages]: Mathematical Logic — Com- 
putational logic; F.4.3 [Mathematical Logic and formal languages]: Formal languages — 
Classes defined by grammars or automata. 

General Terms: Algorithms, Theory. 

Additional Key Words and Phrases: Decision procedure, Finite-state representations, Integer and 

1. INTRODUCTION

Among the techniques used to develop algorithms for deciding or checking logical
formulas, finite automata have played an important role in a variety of cases. Classical
examples are the use of infinite-word finite automata by Büchi [Büchi 1962]
for obtaining decision procedures for the first- and second-order monadic theories
of one successor, as well as the use of tree automata by Rabin [Rabin 1969] for
deciding the second-order monadic theory of n successors. More recent examples
are the use of automata for obtaining decision and model-checking procedures for
temporal and modal logics [Vardi and Wolper 1986a; 1986b; 1994; Kupferman et al.
2000]. In this last setting, automata-based procedures have the advantage of moving
the combinatorial aspects of the procedures to the context of automata, which
are simple graph-like structures well adapted to algorithmic developments. This
separation of concerns between the logical and the algorithmic has been quite fruitful,
for instance in the implementation of model checkers for linear-time temporal
logic [Courcoubetis et al. 1990; Holzmann 1997].

As already noticed by Büchi [Büchi 1962; 1960], automata-based approaches are
not limited to sequential and modal logics, but can also be used for Presburger
arithmetic. To achieve this, one adopts the usual encoding of integers in a base
$r \geq 2$, thus representing an integer as a word over the alphabet $\{0, \ldots, r-1\}$. By
extension, $n$-component integer vectors are represented by words over the alphabet
$\{0, \ldots, r-1\}^n$, and a finite automaton operating over this alphabet represents a set
of integer vectors. Given that addition and order are easily represented by finite
automata and that these automata are closed under Boolean operations as well
as projection, one easily obtains a decision procedure for Presburger arithmetic.
This idea was first explored at the theoretical level, yielding for instance the very
nice result that base-independent finite-automaton representable sets are exactly
the Presburger sets [Cobham 1969; Semenov 1977; Bruyere et al. 1994]. Later, it
was proposed as a practical means of deciding and manipulating Presburger
formulas [Boudet and Comon 1996; Boigelot 1998; Shiple et al. 1998; Wolper and
Boigelot 2000]. The intuition behind this applied use of automata for Presburger
arithmetic is that finite automata play with respect to Presburger arithmetic a role
similar to the one of Binary Decision Diagrams (BDD) with respect to Boolean
logic. These ideas have been implemented in the LASH tool [LASH], which has
been used successfully in the context of verifying systems with unbounded integer
variables.
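
To illustrate why addition is easy to represent, the following Python sketch simulates a
two-state deterministic automaton that reads digit triples most significant digit first and
checks x + y = z. It is only a sketch under simplifying assumptions that are not part of the
original text: the three numbers are non-negative and written with the same number of digits,
and the names `accepts_sum` and `digits` are hypothetical. The single piece of state is the
carry that the digits still to be read must produce.

```python
from typing import List, Sequence


def accepts_sum(xs: Sequence[int], ys: Sequence[int], zs: Sequence[int], r: int = 2) -> bool:
    """Simulate a finite automaton checking x + y = z on base-r digit sequences.

    Digits are given most significant first; the three sequences must have the
    same length and encode non-negative integers (an assumption of this sketch).
    """
    if not (len(xs) == len(ys) == len(zs)):
        raise ValueError("the three encodings must have the same length")
    carry = 0  # carry that the less significant digits are required to produce
    for a, b, c in zip(xs, ys, zs):
        # At this position a + b + carry_in = c + r * carry must hold, which
        # determines the carry required from the next (lower) position.
        carry = c + r * carry - a - b
        if carry not in (0, 1):
            return False  # no valid carry exists: the automaton rejects
    return carry == 0  # nothing can be carried into the least significant digit


def digits(value: int, r: int, length: int) -> List[int]:
    """Most-significant-first base-r digits of a non-negative integer."""
    return [(value // r ** k) % r for k in reversed(range(length))]
```

The automaton only ever stores a carry in {0, 1}, so its state space does not depend on the
length of the words it reads.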

It almost immediately comes to mind that if a finite word over the alphabet
$\{0, \ldots, r-1\}$ can represent an integer, an infinite word over the same alphabet
extended with a fractional part separator (the usual dot) can represent a real number.
Finite automata on infinite words can thus represent sets of real vectors, and serve
as a means of obtaining a decision procedure for real additive arithmetic. Furthermore,
since numbers with fractional parts equal to zero can easily be recognized by
automata, the same technique can be used to obtain a decision procedure for a theory
combining the integers and the reals. This was not previously handled by any
tool, but can be of practical use, for instance in the verification of timed systems
using integer variables [Boigelot et al. 1997]. However, turning this into an effective
implemented system is not as easy as it might first seem. Indeed, projecting and
complementing finite automata on infinite words is significantly more difficult than
for automata on finite words. Projection yields nondeterministic automata and
complementing or determinizing infinite-word automata is a notoriously difficult
problem. A number of algorithms have been proposed for this [Büchi 1962; Sistla
et al. 1987; Safra 1988; Kupferman and Vardi 1997], but even though their theoretical
complexity remains simply exponential as in the finite-word case, it moves
up from $2^{O(n)}$ to $2^{O(n \log n)}$ and none of the proposed algorithms are as easy to
implement and fine-tune as the simple Rabin-Scott subset construction used in the
finite-word case.

However, it is intuitively surprising that handling reals is so much more difficult
than handling integers, especially in light of the fact that the usual polyhedra-based
approach to handling arithmetic is both of lower complexity and easier to
implement for the reals than for the integers [Ferrante and Rackoff 1979]. One
would expect that handling reals with automata should be no more difficult than
handling integers¹. The conclusion that comes out of these observations is that
infinite-word automata constructed from linear arithmetic formulas must have a
special structure that makes them easier to manipulate than general automata on
infinite words. That this special structure exists and that it can be exploited to obtain
simpler algorithms is precisely the subject of this paper.

As a starting point, let us look at the topological characterization of the sets
definable by linear arithmetic formulas. Let us first consider a formula involving
solely real variables. If the formula is quantifier free, it is a Boolean combination
of linear constraints and thus defines a set which is a finite Boolean combination
of open and closed sets. Now, since real linear arithmetic admits quantifier elimination,
the same property also holds for quantified formulas. Then, looking at
classes of automata on infinite words, one notices that the most restricted one that
can accept Boolean combinations of open and closed sets is the class of deterministic
weak automata [Staiger and Wagner 1974; Staiger 1983]. These accept all
$\omega$-regular sets in the Borel class $F_\sigma \cap G_\delta$ and hence also finite Boolean
combinations of open and closed sets. So, with some care about moving from the topology
on vectors to the topology on their encoding as words, one can conclude that the
sets representable by arithmetic formulas involving only real variables can always
be accepted by deterministic weak automata on infinite words. If integers are also
involved in the formula, a similar argument can be used, invoking a recently published
quantifier elimination result for the combined theory [Weispfenning 1999].
However, initially unaware of this result, we developed a different argument to
prove that sets definable by quantified linear arithmetic formulas involving both
real and integer variables are within $F_\sigma \cap G_\delta$ and thus are representable by weak
deterministic automata. This proof relies on separating the integer and fractional
parts of variables and on topological properties of $F_\sigma \cap G_\delta$. It has the advantage of
being much more direct than a proof relying on a quantifier elimination result.

The problematic part of the operations on automata used for deciding a first-order
theory is the sequence of projections and complementations needed to eliminate a
string of quantifiers alternating between existential and universal ones. The second
result of this paper shows that for sets defined in linear arithmetic this can be done
with constructions that are simple adaptations of the ones used for automata on
finite words. Indeed, deterministic weak automata can be viewed as either Büchi or
co-Büchi automata. The interesting fact is that co-Büchi automata can be determinized
by the "breakpoint" construction [Miyano and Hayashi 1984; Kupferman
and Vardi 1997], which basically amounts to a product of subset constructions.
Thus, one has a simple construction to project and determinize a weak automaton,
yielding a deterministic co-Büchi automaton, which is easily complemented into a
deterministic Büchi automaton. In the general case, another round of projection
will lead to a nondeterministic Büchi automaton, for which a general determinization
procedure has to be used. However, we have the result that for automata
obtained from linear arithmetic formulas, the represented sets stay within those
accepted by deterministic weak automata. We prove that this implies that the
automata obtained after determinization will always be weak.

¹ Note that one cannot expect reals to be easier to handle with automata than integers since,
by nature, this representation includes explicit information about the existence of integer values
satisfying the represented formula.

Note that this cannot be directly concluded from the fact that the represented
sets stay within those representable by deterministic weak automata. Indeed, even
though the represented sets can be accepted by deterministic weak automata, the
automata that are obtained by the determinization procedure might not have this
form. Fortunately, we can prove that this is impossible. For this, we go back to the
link between automata and the topology of the sets of infinite words they accept.
The argument is that $\omega$-regular sets in $F_\sigma \cap G_\delta$ have a topological property that
forces the automata accepting them to be inherently weak, i.e. not to have strongly
connected components containing both accepting and non accepting cycles.

Finally, an important additional benefit of working with weak deterministic automata
is that they admit a canonical minimal normal form that can be obtained
efficiently [Maler and Staiger 1997; Löding 2001]. This brings us even closer to the
situation of working with finite-word automata, and is a property that is not available
when working either with general infinite-word automata, or with formulas as
done in [Weispfenning 1999].

As a consequence of our results, we obtain a much simplified decision procedure 
for the theory combining integer and real linear arithmetic. The fact that this theory 
is decidable using automata-based methods was known [Boigelot et al. 1997], but 
the results of this paper make it possible to implement a tool that can handle it 
effectively. 

2. AUTOMATA-THEORETIC AND TOPOLOGICAL BACKGROUND 

In this section we recall some automata-theoretic and topological concepts that are 
used in the paper. 

2.1 Automata on Infinite Words 

An infinite word (or $\omega$-word) $w$ over an alphabet $\Sigma$ is a mapping $w : \mathbb{N} \to \Sigma$ from
the natural numbers to $\Sigma$. A Büchi automaton on infinite words is a five-tuple
$A = (Q, \Sigma, \delta, q_0, F)$, where

— $Q$ is a finite set of states;
— $\Sigma$ is the input alphabet;
— $\delta$ is the transition function and is of the form $\delta : Q \times \Sigma \to 2^Q$ if the automaton is
nondeterministic and of the form $\delta : Q \times \Sigma \to Q$ if the automaton is deterministic;
— $q_0$ is the initial state;
— $F$ is a set of accepting states.

A run $\pi$ of a Büchi automaton $A = (Q, \Sigma, \delta, q_0, F)$ on an $\omega$-word $w$ is a mapping
$\pi : \mathbb{N} \to Q$ that satisfies the following conditions:

— $\pi(0) = q_0$, i.e. the run starts in the initial state;
— for all $i \geq 0$, $\pi(i+1) \in \delta(\pi(i), w(i))$ (nondeterministic automata) or
$\pi(i+1) = \delta(\pi(i), w(i))$ (deterministic automata), i.e. the run respects the transition
function.

Let $\mathit{inf}(\pi)$ be the set of states that occur infinitely often in a run $\pi$. A run $\pi$
is said to be accepting if $\mathit{inf}(\pi) \cap F \neq \emptyset$. An $\omega$-word $w$ is accepted by a Büchi
automaton if that automaton has some accepting run on $w$. The language $L_\omega(A)$
of infinite words defined by a Büchi automaton $A$ is the set of $\omega$-words it accepts.
The $\omega$-regular languages are defined as the languages of infinite words that can be
accepted by a nondeterministic Büchi automaton.

A co-Büchi automaton is defined exactly as a Büchi automaton except that its
accepting runs are those for which $\mathit{inf}(\pi) \cap F = \emptyset$.

We will also use the notion of weak automata [Muller et al. 1986]. For a Büchi
automaton $A = (Q, \Sigma, \delta, q_0, F)$ to be weak, there has to be a partition of its state
set $Q$ into disjoint subsets $Q_1, \ldots, Q_m$ such that

— for each of the $Q_i$ either $Q_i \subseteq F$ or $Q_i \cap F = \emptyset$, and
— there is a partial order $\leq$ on the sets $Q_1, \ldots, Q_m$ such that for every $q \in Q_i$ and
$q' \in Q_j$ for which, for some $a \in \Sigma$, $q' \in \delta(q, a)$ ($q' = \delta(q, a)$ in the deterministic
case), $Q_j \leq Q_i$.

For more details, a survey of automata on infinite words can be found in [Thomas 
1990]. 
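
The acceptance conditions above can be made concrete on ultimately periodic words. The
following Python sketch is an illustration only, not part of the original development: it
assumes a deterministic automaton given as a dictionary from (state, symbol) pairs to
states, and an input word of the form prefix · period^ω. The function names
`infinitely_visited` and `accepts` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], State]


def infinitely_visited(delta: Delta, q0: State, prefix: Sequence[str],
                       period: Sequence[str]) -> Set[State]:
    """Return inf(pi) for the unique run on the word prefix . period^omega."""
    if not period:
        raise ValueError("the period must be non-empty")
    q = q0
    for a in prefix:
        q = delta[(q, a)]
    # Read the period repeatedly until the state at a period boundary repeats;
    # from that point on the run cycles through the same rounds forever.
    round_of_boundary = {}
    visited_per_round = []
    while q not in round_of_boundary:
        round_of_boundary[q] = len(visited_per_round)
        visited = set()
        for a in period:
            q = delta[(q, a)]
            visited.add(q)
        visited_per_round.append(visited)
    first_repeated_round = round_of_boundary[q]
    return set().union(*visited_per_round[first_repeated_round:])


def accepts(delta: Delta, q0: State, accepting: Iterable[State],
            prefix: Sequence[str], period: Sequence[str],
            co_buchi: bool = False) -> bool:
    """Büchi acceptance: inf(pi) meets F. Co-Büchi acceptance: inf(pi) avoids F."""
    meets_f = bool(infinitely_visited(delta, q0, prefix, period) & set(accepting))
    return (not meets_f) if co_buchi else meets_f
```

For instance, with states 0 and 1, F = {1}, and transitions that move to state 1 on "a" and
to state 0 on "b", the Büchi reading accepts exactly the words containing infinitely many
a's, whereas the co-Büchi reading of the same structure accepts the words containing
finitely many a's.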

2.2 Topology 

Given a set $S$, a distance $d(x, y)$ defined on this set induces a metric topology on
subsets of $S$. A neighborhood $N_\varepsilon(x)$ of a point $x \in S$ with respect to $\varepsilon \in \mathbb{R}^+$ is the
set $N_\varepsilon(x) = \{y \mid d(x, y) < \varepsilon\}$. A set $C \subseteq S$ is said to be open if for all $x \in C$, there
exists $\varepsilon > 0$ such that the neighborhood $N_\varepsilon(x)$ is contained in $C$. A closed set is a
set whose complement with respect to $S$ is open. We will be referring to the first
few levels of the Borel hierarchy which are shown in Figure 1. The notations used
are the following:

— $F$ are the closed sets,
— $G$ are the open sets,
— $F_\sigma$ is the class of countable unions of closed sets,
— $G_\delta$ is the class of countable intersections of open sets,
— $F_{\sigma\delta}$ is the class of countable intersections of $F_\sigma$ sets,
— $G_{\delta\sigma}$ is the class of countable unions of $G_\delta$ sets,
— $B(X)$ represents the finite Boolean combinations of sets in $X$.

An arrow between classes indicates proper inclusion.

Fig. 1. The first few levels of the Borel hierarchy in a metric topology.

3. TOPOLOGICAL CHARACTERIZATION OF ARITHMETIC SETS

We consider the theory $\langle \mathbb{R}, \mathbb{Z}, +, \leq \rangle$, where $+$ represents the predicate $x + y = z$.
Since any linear equality or order constraint can be encoded into this theory, we
refer to it as additive or linear arithmetic over the reals and integers. It is the
extension of Presburger arithmetic that includes both real and integer variables.
We provide the space $\mathbb{R}^n$ ($n > 0$) with the classical Euclidean distance between
vectors defined by

\[ d(x, y) = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}. \]

The topology induced by this metric will be referred to as the natural topology of $\mathbb{R}^n$.

In this section, we prove that the sets representable in the additive linear arithmetic
over the reals and integers belong to the topological class $F_\sigma \cap G_\delta$. This result
is formalized by the following theorem.

Theorem 3.1. Let $S \subseteq \mathbb{R}^n$, with $n > 0$, be a set defined in the theory
$\langle \mathbb{R}, \mathbb{Z}, +, \leq \rangle$. This set belongs to the class $F_\sigma \cap G_\delta$ of the natural topology of $\mathbb{R}^n$.

Proof. Since $\langle \mathbb{R}, \mathbb{Z}, +, \leq \rangle$ is closed under negation and the complement of an
$F_\sigma$ set is a $G_\delta$ set, it is actually sufficient to
show that each formula of this theory defines a set that belongs to $F_\sigma$, i.e., a set
that can be expressed as a countable union of closed sets.

Let $\varphi$ be a formula of $\langle \mathbb{R}, \mathbb{Z}, +, \leq \rangle$. To simplify our argument, we will assume
that all free variables of $\varphi$ are reals. This can be done without loss of generality
since quantified variables can range over both $\mathbb{R}$ and $\mathbb{Z}$. We introduce $u < v$ as a
shorthand for $u \leq v \land \lnot(u = v)$.

The first step of our proof consists of modifying $\varphi$ in the following way. We
replace each variable $x$ that appears in $\varphi$ by two variables $x_I$ and $x_F$ representing
respectively the integer and the fractional part of $x$. Formally, this operation replaces
each occurrence in $\varphi$ of a free variable $x$ by the sum $x_I + x_F$ while adding
to $\varphi$ the constraints $0 \leq x_F$ and $x_F < 1$, and transforms the quantified variables of
$\varphi$ according to the following rules:

\[
\begin{array}{rcl}
(\exists x \in \mathbb{R})\phi & \longrightarrow & (\exists x_I \in \mathbb{Z})(\exists x_F \in \mathbb{R})(0 \leq x_F \land x_F < 1 \land \phi[x/x_I + x_F]) \\
(\forall x \in \mathbb{R})\phi & \longrightarrow & (\forall x_I \in \mathbb{Z})(\forall x_F \in \mathbb{R})(x_F < 0 \lor 1 \leq x_F \lor \phi[x/x_I + x_F]) \\
(Qx \in \mathbb{Z})\phi & \longrightarrow & (Qx_I \in \mathbb{Z})\phi[x/x_I],
\end{array}
\]

where $Q \in \{\exists, \forall\}$, $\phi$ is a subformula, and $\phi[x/y]$ denotes the result of replacing
by $y$ each occurrence of $x$ in $\phi$. The transformation has no influence on the set
represented by $\varphi$, except that the integer and fractional parts of each value are now
represented by two distinct variables.

Now, the atomic formulas of $\varphi$ are of the form $p = q + r$, $p = q$ or $p \leq q$, where $p$, $q$
and $r$ are either integer variables, sums of an integer and of a fractional variable, or
integer constants. The second step consists of expanding these atomic formulas so
as to send into distinct atoms the occurrences of the integer and of the fractional
variables. This is easily done with the help of simple arithmetic rules, for the truth
value of the atomic formulas that involve both types of variables has only to be
preserved for values of the fractional variables that belong to the interval $[0, 1)$.
The set of expansion rules² (up to commutability of members and terms) is given
in Figure 2.
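
As a worked instance of the carry case that arises when two sums of integer and fractional
parts are added, consider values chosen here purely for illustration (they do not appear in
the original text), namely $y = 1.7$ and $z = 2.5$:

```latex
\begin{align*}
y &= 1.7 \;\Rightarrow\; y_I = 1,\ y_F = 0.7,
\qquad
z = 2.5 \;\Rightarrow\; z_I = 2,\ z_F = 0.5, \\
x &= y + z = 4.2 \;\Rightarrow\; x_I = 4 = y_I + z_I + 1,
\qquad
x_F = 0.2 = y_F + z_F - 1 .
\end{align*}
```

Since $y_F + z_F = 1.2$ is not smaller than 1, the fractional parts produce a carry into the
integer part, which is the situation covered by the second disjunct of the rule for
$(x_I + x_F) = (y_I + y_F) + (z_I + z_F)$.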

After the transformation, each atomic formula of $\varphi$ is either a formula $\phi_I$ involving
only integer variables or a formula $\phi_F$ over fractional variables. We now
distribute existential (resp. universal) quantifiers over disjunctions (resp. conjunctions),
after rewriting their argument into disjunctive (resp. conjunctive) normal
form, and then apply the simplification rules

\[
\begin{array}{rcl}
(Qx_I \in \mathbb{Z})(\phi_I \;\alpha\; \phi_F) & \longrightarrow & (Qx_I \in \mathbb{Z})(\phi_I) \;\alpha\; \phi_F \\
(Qx_F \in \mathbb{R})(\phi_I \;\alpha\; \phi_F) & \longrightarrow & \phi_I \;\alpha\; (Qx_F \in \mathbb{R})(\phi_F),
\end{array}
\]

where $Q \in \{\exists, \forall\}$ and $\alpha \in \{\lor, \land\}$.

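\[
\begin{array}{rcl}
x_I = (y_I + y_F) & \longrightarrow & x_I = y_I \land y_F = 0 \\
(x_I + x_F) = (y_I + y_F) & \longrightarrow & x_I = y_I \land x_F = y_F \\
x_I = y_I + (z_I + z_F) & \longrightarrow & x_I = y_I + z_I \land z_F = 0 \\
x_I = (y_I + y_F) + (z_I + z_F) & \longrightarrow & (x_I = y_I + z_I \land y_F + z_F = 0) \;\lor \\
 & & (x_I = y_I + z_I + 1 \land y_F + z_F = 1) \\
(x_I + x_F) = y_I + z_I & \longrightarrow & x_I = y_I + z_I \land x_F = 0 \\
(x_I + x_F) = y_I + (z_I + z_F) & \longrightarrow & x_I = y_I + z_I \land x_F = z_F \\
(x_I + x_F) = (y_I + y_F) + (z_I + z_F) & \longrightarrow & (x_I = y_I + z_I \land x_F = y_F + z_F) \;\lor \\
 & & (x_I = y_I + z_I + 1 \land x_F = y_F + z_F - 1) \\
x_I \leq (y_I + y_F) & \longrightarrow & x_I \leq y_I \\
(x_I + x_F) \leq y_I & \longrightarrow & x_I < y_I \lor (x_I = y_I \land x_F = 0) \\
(x_I + x_F) \leq (y_I + y_F) & \longrightarrow & x_I < y_I \lor (x_I = y_I \land x_F \leq y_F)
\end{array}
\]

Fig. 2. Expansion rules.

² In these rules, the expression $p = q + r + s$ is introduced as a shorthand for
$(\exists u \in \mathbb{R})(u = q + r \land p = u + s)$.

Repeating this operation, we eventually get a formula $\varphi'$ equivalent to $\varphi$ that
takes the form of a finite Boolean combination

\[ B(\phi_I^{(1)}, \ldots, \phi_I^{(m)}, \phi_F^{(1)}, \ldots, \phi_F^{(m')}) \]

of subformulas $\phi_I^{(i)}$ and $\phi_F^{(i)}$ that involve respectively only integer and fractional
variables.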

Let $x_I^{(1)}, x_I^{(2)}, \ldots, x_I^{(k)}$ be the free integer variables of $\varphi'$ ($k \leq n$). For each
assignment of values to these variables, the subformulas $\phi_I^{(i)}$ are each identically
true or false, hence we have

\[
\varphi \equiv \bigvee_{(a_1, \ldots, a_k) \in \mathbb{Z}^k}
\left( (x_I^{(1)}, \ldots, x_I^{(k)}) = (a_1, \ldots, a_k) \land B_{(a_1, \ldots, a_k)}(\phi_F^{(1)}, \ldots, \phi_F^{(m')}) \right).
\]

Each subformula $\phi_F^{(i)}$ belongs to the theory $\langle \mathbb{R}, +, \leq, 1 \rangle$, which admits the elimination
of quantifiers [Ferrante and Rackoff 1979]. The sets of real vectors satisfying
these formulas are thus finite Boolean combinations of linear constraints with open
or closed boundaries. It follows that, for each $(a_1, \ldots, a_k) \in \mathbb{Z}^k$, the set described
by $B_{(a_1, \ldots, a_k)}$ is a finite Boolean combination of open and closed sets, that is a set
belonging to the topological class $B(F) = B(G)$. Since, according to properties of
the Borel hierarchy, this class forms a subset of $F_\sigma$, the set described by $\varphi$ is a
countable union of countable unions of closed sets and also lies within $F_\sigma$. □

4. REPRESENTING SETS OF INTEGERS AND REALS WITH FINITE AUTOMATA 

In this section, we recall the finite-state representation of sets of real vectors as 
introduced in [Boigelot et al. 1997]. 

In order to make a finite automaton recognize numbers, one needs to establish
a mapping between these and words. Our encoding scheme corresponds to the
usual notation for reals and relies on an arbitrary integer base $r > 1$. We encode
a number $x$ in base $r$, most significant digit first, by words of the form $w_I \star w_F$,
where $w_I$ encodes the integer part $x_I$ of $x$ as a finite word over $\{0, \ldots, r-1\}$, the
special symbol "$\star$" is a separator, and $w_F$ encodes the fractional part $x_F$ of $x$ as
an infinite word over $\{0, \ldots, r-1\}$. Negative numbers are represented by their $r$'s
complement. The length $p$ of $w_I$, which we refer to as the integer-part length of
$w$, is not fixed but must be large enough for $-r^{p-1} \leq x_I < r^{p-1}$ to hold.

According to this scheme, each number has an infinite number of encodings, since
their integer-part length can be increased unboundedly. In addition, the rational
numbers whose denominator has only prime factors that are also factors of $r$ have
two distinct encodings with the same integer-part length. For example, in base
10, the number 11/2 has the encodings $005 \star 5(0)^\omega$ and $005 \star 4(9)^\omega$, "$\omega$" denoting
infinite repetition.
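
The encoding scheme can be sketched in Python for rational inputs, whose fractional
expansions are ultimately periodic and can therefore be written finitely. This is an
illustration under assumptions that are not in the original text: the function name `encode`
is hypothetical, the result is returned as three digit lists (integer part, then the
non-repeating and repeating parts of the fractional word), and only one of the possible
encodings is produced (the one ending in $0^\omega$ when two exist).

```python
from fractions import Fraction
from math import floor
from typing import List, Tuple


def encode(x: Fraction, r: int, p: int) -> Tuple[List[int], List[int], List[int]]:
    """Encode x in base r with integer-part length p.

    Returns (integer digits, fractional prefix, fractional period), all most
    significant digit first; the fractional word is prefix . period^omega.
    """
    if r < 2 or p < 1:
        raise ValueError("need a base r > 1 and a length p >= 1")
    x_int = floor(x)       # integer part x_I (floor, so that 0 <= x_F < 1)
    x_frac = x - x_int     # fractional part x_F
    if not -r ** (p - 1) <= x_int < r ** (p - 1):
        raise ValueError("integer-part length p is too small for this number")
    # r's complement on p digits: a negative x_I is stored as x_I + r**p.
    value = x_int % r ** p
    int_digits = [(value // r ** k) % r for k in reversed(range(p))]
    # Long division of the fractional part; a repeated remainder closes the period.
    frac_digits: List[int] = []
    position_of_remainder = {}
    rem, den = x_frac.numerator, x_frac.denominator
    while rem not in position_of_remainder:
        position_of_remainder[rem] = len(frac_digits)
        rem *= r
        frac_digits.append(rem // den)
        rem %= den
    start = position_of_remainder[rem]
    return int_digits, frac_digits[:start], frac_digits[start:]
```

With this sketch, `encode(Fraction(11, 2), 10, 3)` yields the digit lists corresponding to
$005 \star 5(0)^\omega$; increasing `p` gives the other encodings of the same number with a
longer integer part.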

To encode a vector of real numbers, we represent each of its components by words
of identical integer-part length. This length can be chosen arbitrarily, provided that
it is sufficient for encoding the vector component with the highest magnitude. An
encoding of a vector $x \in \mathbb{R}^n$ can indifferently be viewed either as an $n$-tuple of words
of identical integer-part length over the alphabet $\{0, \ldots, r-1, \star\}$, or as a single
word $w$ over the alphabet $\{0, \ldots, r-1\}^n \cup \{\star\}$.

Since a real vector has an infinite number of possible encodings, we have to 
choose which of these the automata will recognize. A natural choice is to accept all 
encodings. This leads to the following definition. 

Definition 4.1. Let $n > 0$ and $r > 1$ be integers. A Real Vector Automaton
(RVA) $A$ in base $r$ for vectors in $\mathbb{R}^n$ is a Büchi automaton over the alphabet
$\{0, \ldots, r-1\}^n \cup \{\star\}$, such that

— every word accepted by $A$ is an encoding in base $r$ of a vector in $\mathbb{R}^n$, and
— for every vector $x \in \mathbb{R}^n$, $A$ accepts either all the encodings of $x$ in base $r$, or none
of them.

An RVA is said to represent the set of vectors encoded by the words that belong 
to its accepted language. 

Efficient algorithms have been developed for constructing RVA representing the 
sets of solutions of systems of linear equations and inequations [Boigelot et al. 1998]. 
Boolean operations can easily be achieved on RVA by applying the corresponding 
existing algorithms for infinite-word automata. 

Furthermore, a set represented as an RVA can be quantified existentially with
respect to its $i$-th vector component over the real domain, by replacing each symbol in
$\{0, \ldots, r-1\}^n$ read by the automaton with the same symbol out of which the $i$-th
component has been removed. This produces a nondeterministic automaton that
may only accept some encodings of each vector in the quantified set, but generally
not all of them. Such a situation can arise if the component of highest magnitude
for some vectors in the set is projected out³. The second step thus consists of
modifying the automaton so as to make it accept every encoding of each vector
that it recognizes. Algorithms have been developed for this purpose in the case of
finite-word automata [Boigelot 1998; Boigelot and Latour 2001]. These algorithms
also apply to RVA, since the behavior of the underlying Büchi automaton before
reading the separator "$\star$" is identical to that of a finite-word automaton recognizing
the integer part of the vectors in the represented set.
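
The first step of this quantification, removing one component from every symbol, can be
sketched as follows. The representation is an assumption made for illustration only:
transitions are triples (source, symbol, destination), a symbol is either a tuple of digits
or the separator, components are indexed from 0, and the name `project_out` is hypothetical.
The second step (accepting all encodings) is not covered by this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple, Union

SEPARATOR = "*"
Symbol = Union[Tuple[int, ...], str]
Transition = Tuple[Hashable, Symbol, Hashable]


def project_out(transitions: Iterable[Transition],
                i: int) -> Dict[Tuple[Hashable, Symbol], Set[Hashable]]:
    """Drop component i of every digit tuple labelling a transition.

    Transitions whose labels become equal are merged, so the result maps a
    (state, symbol) pair to a set of successors and is nondeterministic in general.
    """
    projected = defaultdict(set)
    for source, symbol, destination in transitions:
        if symbol != SEPARATOR:
            symbol = symbol[:i] + symbol[i + 1:]
        projected[(source, symbol)].add(destination)
    return dict(projected)
```

Two transitions leaving the same state on symbols that differ only in the removed component
end up with the same label, which is where the nondeterminism comes from.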

Finally, since it is immediate to constrain a number to be an integer with an RVA
by imposing its fractional part to be either $0^\omega$ or $(r-1)^\omega$ (i.e. by intersecting its
accepted language with $\{0, r-1\}^n \cdot (\{0, \ldots, r-1\}^n)^* \cdot \{\star\} \cdot \{0, r-1\}^\omega$), it follows
that one can construct an RVA for any formula of the arithmetic theory we are
considering.

³ For instance, projecting out the first component of the set {(8, 1)} in binary would produce an
automaton that does not accept encodings of 1 having fewer than five bits in their integer part.

5. WEAK AUTOMATA AND THEIR PROPERTIES 

If one examines the constructions given in [Boigelot et al. 1998] to build RVA for 
linear equations and inequations, one notices that they have the property that all 
states within the same strongly connected component are either accepting or non 
accepting. This implies that these automata are weak in the sense of [Muller et al. 
1986] (see Section 2.1). 

5.1 Determinizing Weak Automata 

Weak automata have a number of interesting properties. A first one is that they
can be represented both as Büchi and co-Büchi. Indeed, a weak automaton
$A = (Q, \Sigma, \delta, q_0, F)$ is equivalent to the co-Büchi automaton $(Q, \Sigma, \delta, q_0, Q \setminus F)$,
since a run eventually remains within a single component $Q_i$ in which all states
have the same status with respect to being accepting. A consequence of this is that
weak automata can be determinized by the fairly simple "breakpoint" construction
[Kupferman and Vardi 1997; Miyano and Hayashi 1984] that can be used for
co-Büchi automata. This construction is the following.

Let $A = (Q, \Sigma, \delta, q_0, F)$ be a nondeterministic co-Büchi automaton. The deterministic
co-Büchi automaton $A' = (Q', \Sigma, \delta', q'_0, F')$ defined as follows accepts the
same $\omega$-language:

— $Q' = 2^Q \times 2^Q$, i.e. the states of $A'$ are pairs of sets of states of $A$.
— $q'_0 = (\{q_0\}, \emptyset)$.
— For $(S, R) \in Q'$ and $a \in \Sigma$, the transition function is defined by
  —if $R = \emptyset$, then $\delta'((S, R), a) = (T, T \setminus F)$ where $T = \{q \mid (\exists p \in S)\; q \in \delta(p, a)\}$:
  $T$ is obtained from $S$ as in the classical subset construction, and the second
  component of the pair of sets of states is obtained from $T$ by eliminating states
  in $F$;
  —if $R \neq \emptyset$, then $\delta'((S, R), a) = (T, U \setminus F)$ where $T = \{q \mid (\exists p \in S)\; q \in \delta(p, a)\}$,
  and $U = \{q \mid (\exists p \in R)\; q \in \delta(p, a)\}$: the subset construction is now applied
  to both $S$ and $R$ and the states in $F$ are removed from $U$.
— $F' = 2^Q \times \{\emptyset\}$.

When the automaton $A'$ is in a state $(S, R)$, $R$ represents the states of $A$ that
can be reached by a run that has not gone through a state in $F$ since the last
"breakpoint", i.e. state of the form $(S, \emptyset)$. So, for a given word, $A$ has a run that
does not go infinitely often through a state in $F$ if and only if $A'$ has a run that
does not go infinitely often through a state in $F'$. Notice that the difficulty that
exists for determinizing Büchi automata, which is to make sure that the same run
repeatedly reaches an accepting state, disappears since, for co-Büchi automata, we
are just looking for a run that eventually avoids accepting states.

It is interesting to notice that the construction implies that all reachable states
$(S, R)$ of $A'$ satisfy $R \subseteq S$. The breakpoint construction can thus be implemented as
a subset construction in which the states in $R$ are simply tagged, which implies that
the worst-case complexity of the construction is $2^{O(n)}$. This makes the construction
behave in practice very similarly to the traditional subset construction for finite-word
automata.
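
The breakpoint construction described above can be sketched in Python as follows. This is
an illustrative sketch rather than the implementation used by the authors: the function name
`breakpoint_determinize` and the dictionary-based representation of the transition relation
are assumptions, and only the reachable part of $A'$ is built.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
Pair = Tuple[FrozenSet[State], FrozenSet[State]]


def breakpoint_determinize(alphabet: Iterable[str],
                           delta: Dict[Tuple[State, str], Set[State]],
                           q0: State,
                           F: Iterable[State]):
    """Determinize a nondeterministic co-Büchi automaton (reachable part only).

    Returns (states, transitions, initial state, F') of a deterministic
    co-Büchi automaton whose states are pairs (S, R) of sets of states.
    """
    alphabet = list(alphabet)
    F = frozenset(F)

    def post(states: FrozenSet[State], a: str) -> FrozenSet[State]:
        # Classical subset-construction successor.
        return frozenset(q for p in states for q in delta.get((p, a), ()))

    initial: Pair = (frozenset([q0]), frozenset())
    det_delta: Dict[Tuple[Pair, str], Pair] = {}
    reachable = {initial}
    todo = [initial]
    while todo:
        S, R = todo.pop()
        for a in alphabet:
            T = post(S, a)
            U = post(R, a) if R else T  # at a breakpoint (R empty), restart from T
            successor = (T, U - F)
            det_delta[((S, R), a)] = successor
            if successor not in reachable:
                reachable.add(successor)
                todo.append(successor)
    F_prime = {state for state in reachable if not state[1]}  # the breakpoints
    return reachable, det_delta, initial, F_prime


# Example (chosen for illustration): words over {a, b} with finitely many a's.
# State 0 is in F; the automaton guesses on some "b" that no "a" follows.
example_delta = {(0, "a"): {0}, (0, "b"): {0, 1}, (1, "b"): {1}}
states, det_delta, initial, F_prime = breakpoint_determinize("ab", example_delta, 0, {0})
```

On this example the result has two reachable states, and every "a" leads back to the
breakpoint state, so a word is accepted exactly when breakpoints stop occurring.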

5.2 Topological Characterization 

Another property of weak automata that will be of particular interest to us is the
topological characterization of the sets of words that they can accept. We consider
the topology on the sets of infinite words over an alphabet $\Sigma$ induced by the distance
on the $\omega$-words



where $|\mathit{common}(w, w')|$ denotes the length of the longest common prefix of $w$ and
$w'$. The open sets in such a topological space are the sets of the form $X \cdot \Sigma^\omega$,
where $X \subseteq \Sigma^+$ is a language of finite words. Relations between this topology and
automata are well understood. For instance, it has been proved that the languages
of infinite words that can be accepted by a deterministic Büchi automaton are
exactly the $\omega$-rational languages belonging to the class $G_\delta$ [Landweber 1969]. By
duality, deterministic co-Büchi automata accept exactly the $\omega$-regular languages
that belong to $F_\sigma$.

As weak deterministic automata can be seen both as deterministic Büchi and
deterministic co-Büchi, they accept exactly the $\omega$-regular languages that are in
$F_\sigma \cap G_\delta$. This follows from the results on the Staiger-Wagner class of automata [Staiger
and Wagner 1974; Staiger 1983], which coincides with the class of deterministic
weak automata, as can be inferred from [Staiger and Wagner 1974] and is shown
explicitly in [Maler and Staiger 1997].

5.3 Inherently Weak Automata 

Given the result proved in Section 3, it is tempting to conclude that the encodings
of sets definable in the theory $\langle \mathbb{R}, \mathbb{Z}, +, \leq \rangle$ can always be accepted by weak
deterministic automata. This conclusion is correct, but requires shifting the result from
the topology on numbers to the topology on words, which we will do in the next
section. In the meantime, we need one more result in order to be able to benefit
algorithmically from the fact that we are dealing with $F_\sigma \cap G_\delta$ sets, namely that any
deterministic automaton accepting an $F_\sigma \cap G_\delta$ set is essentially a weak automaton.
Consider the following definition.

Definition 5.1. A Büchi automaton is inherently weak if none of the reachable
strongly connected components of its transition graph contains both accepting
(including at least one accepting state) and non accepting (not including any accepting
state) cycles.

Clearly, if an automaton is inherently weak, it can directly be transformed into 
a weak automaton : the partition of the state set is its partition into strongly 
connected components and all the states of a component are made accepting or 
not, depending on whether the cycles in that component are accepting or not. 
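
The definition of inherent weakness can be checked directly on the transition graph. The
Python sketch below is an illustration under stated assumptions: the graph is given as a
dictionary mapping each state to its successors (transition labels are irrelevant here), the
function names are hypothetical, and components are computed by plain reachability, which
is quadratic but keeps the code short. It relies on two observations: a non-trivial
component contains an accepting cycle exactly when it contains an accepting state, and it
contains a non accepting cycle exactly when its non accepting states alone form a cycle.

```python
from typing import Dict, Hashable, Iterable, Set

State = Hashable


def reachable_from(succ: Dict[State, Iterable[State]], start: State) -> Set[State]:
    """States reachable from `start` by one or more transitions."""
    seen: Set[State] = set()
    stack = list(succ.get(start, ()))
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(succ.get(q, ()))
    return seen


def is_inherently_weak(succ: Dict[State, Iterable[State]], q0: State,
                       accepting: Iterable[State]) -> bool:
    accepting = set(accepting)
    reach = {q0} | reachable_from(succ, q0)
    plus = {q: reachable_from(succ, q) for q in reach}
    handled: Set[State] = set()
    for q in reach:
        if q in handled or q not in plus[q]:
            continue  # component already examined, or q lies on no cycle
        component = {p for p in plus[q] if q in plus[p]}
        handled |= component
        has_accepting_cycle = bool(component & accepting)
        rejecting = component - accepting
        restricted = {p: [s for s in succ.get(p, ()) if s in rejecting]
                      for p in rejecting}
        has_rejecting_cycle = any(p in reachable_from(restricted, p)
                                  for p in rejecting)
        if has_accepting_cycle and has_rejecting_cycle:
            return False
    return True
```

For example, a two-state deterministic Büchi automaton for "infinitely many a's", with a
self-loop on the non accepting state, is reported as not inherently weak, while an automaton
that moves once from a non accepting loop to an accepting loop is inherently weak.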
We will now prove the following. 

Theorem 5.2. Any deterministic Büchi automaton that accepts a language in
$F_\sigma \cap G_\delta$ is inherently weak.

To prove this, we use the fact that the language accepted by an automaton that 
is not inherently weak must have the following property. 

Definition 5.3. A language $L \subseteq \Sigma^\omega$ has the dense oscillating sequence property
if, $w_1, w_2, w_3, \ldots$ being words and $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots$ being distances, one has that
$\exists w_1 \forall \varepsilon_1 \exists w_2 \forall \varepsilon_2 \cdots$ such that $d(w_i, w_{i+1}) \leq \varepsilon_i$ for all $i \geq 1$, $w_i \in L$ for all odd $i$, and
$w_i \notin L$ for all even $i$.

Showing that this infinitesimal oscillation is incompatible with the structure of weak
deterministic automata will allow us to conclude. The proof of Theorem 5.2 can
thus be split into the two following lemmas.

Lemma 5.4. Each $\omega$-language accepted by a Büchi automaton that is not inherently
weak has the dense oscillating sequence property.

Proof. Consider a reachable strongly connected component that contains both an
accepting and a non accepting cycle, and call $p$ a finite word that leads from the
initial state of the automaton to the first state of the accepting cycle. Let $c_A$ (resp.
$c_N$) be the finite word that labels the accepting (resp. non accepting) cycle, and
$t_A$ (resp. $t_N$) a finite word that labels the path from the first state of the accepting
(resp. non accepting) cycle to the first state of the non accepting (resp. accepting)
cycle.

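Given an infinite sequence of distances $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots$, we are now ready to
construct a dense oscillating sequence for the language $L$ accepted by the
automaton. If $k_2, k_3, k_4, \ldots$ is a sequence of natural numbers, define $u_1 = p$, and
for all $i > 1$:

\[ u_i = \begin{cases} u_{i-1}\, c_A^{k_i}\, t_A & \text{if } i \text{ is even} \\ u_{i-1}\, c_N^{k_i}\, t_N & \text{if } i \text{ is odd.} \end{cases} \]

The word $w_i$ ($i \ge 1$) is then defined as follows:

\[ w_i = \begin{cases} u_i\, c_A^{\omega} & \text{if } i \text{ is odd} \\ u_i\, c_N^{\omega} & \text{if } i \text{ is even.} \end{cases} \]
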
Given $i \ge 1$, it is always possible to find an integer $k_{i+1}$ large enough for
$d(w_i, w_{i+1}) < \varepsilon_i$ to hold. Indeed, the length of the common prefix between $w_i$
and $w_{i+1}$ increases with $k_{i+1}$. Furthermore, $w_i$ loops either in an accepting cycle
if $i$ is odd, or in a non accepting cycle if $i$ is even; hence, $w_i \in L$ if and only if
$i$ is odd. Thus, the sequence of $w_i$'s is dense oscillating for the language
accepted by the automaton. □
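
The construction used in this proof can be tried out on a small example. The Python sketch below follows one plausible reading of it: starting from the word that loops forever in the accepting cycle, each new word repeats the current cycle $k$ more times, crosses to the other cycle, and loops there, with $k$ chosen as the smallest value that brings the new word within the requested distance. Infinite words are represented as pairs (finite prefix, repeated cycle). The metric $d(w, w') = 1/(1 + \text{length of the common prefix})$ is an assumption of this sketch, as are all function names.

```python
from fractions import Fraction


def symbol(word, i):
    """i-th symbol of the ultimately periodic word prefix . cycle^omega."""
    prefix, cycle = word
    return prefix[i] if i < len(prefix) else cycle[(i - len(prefix)) % len(cycle)]


def distance(w1, w2):
    """Assumed metric: 1 / (1 + common prefix length), and 0 for equal words."""
    # Beyond both prefixes the two words share the period len(c1) * len(c2),
    # so agreement up to this horizon means the words are equal.
    horizon = max(len(w1[0]), len(w2[0])) + len(w1[1]) * len(w2[1])
    for i in range(horizon):
        if symbol(w1, i) != symbol(w2, i):
            return Fraction(1, i + 1)
    return Fraction(0)


def oscillating_sequence(p, c_A, c_N, t_A, t_N, epsilons):
    """Return w_1, ..., w_{m+1} for the m given (positive) distances."""
    u = p
    words = [(u, c_A)]  # w_1 loops in the accepting cycle
    for i, eps in enumerate(epsilons, start=1):
        # w_i loops in the accepting cycle when i is odd, else in the other one.
        loop, bridge, nxt = (c_A, t_A, c_N) if i % 2 == 1 else (c_N, t_N, c_A)
        k = 0
        while distance(words[-1], (u + loop * k + bridge, nxt)) >= eps:
            k += 1  # a longer common prefix brings the words closer
        u = u + loop * k + bridge
        words.append((u, nxt))
    return words


# Automaton for "infinitely many a": accepting self-loop on a, rejecting
# self-loop on b, so p is empty, c_A = "a", c_N = "b", t_A = "b", t_N = "a".
eps = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)]
for prefix, cycle in oscillating_sequence("", "a", "b", "b", "a", eps):
    print(repr(prefix), "followed by", repr(cycle), "forever")
```

For this automaton a word of the form prefix . cycle^omega is accepted exactly when its cycle contains an `a`, so the words produced alternate between members and non-members of the language while getting arbitrarily close to one another.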

Lemma 5.5. An $\omega$-regular language that has the dense oscillating sequence
property cannot be accepted by a weak deterministic automaton and hence is not in
$F_\sigma \cap G_\delta$.

Proof. We proceed by contradiction. Assume that a language $L$ having the
dense oscillating sequence property is accepted by a weak deterministic automaton
$A$. Consider the first word $w_1$ in a dense oscillating sequence for $L$. This word
eventually reaches an accepting component $Q_{i_1}$ of the partition of the state set

ACM Transactions on Computational Logic, Vol. V, No. N, Month 20YY. 




of $A$ and will stay within this component. Since $\varepsilon_1$ can be chosen freely, it can
be taken small enough for the run of $A$ on $w_2$ to also reach the component $Q_{i_1}$
before it starts to differ from $w_1$. Since $w_2$ is not in $L$, the run of $A$ on $w_2$ has to
eventually leave the component $Q_{i_1}$ and will eventually reach and stay within a non
accepting component $Q_{i_2} < Q_{i_1}$. Repeating a similar argument, one can conclude
that the run of $A$ on $w_3$ eventually reaches and stays within an accepting component
$Q_{i_3} < Q_{i_2}$. Carrying on with this line of reasoning, one concludes that the state set
of $A$ must contain an infinite decreasing sequence of distinct components, which is
impossible given that it is finite. □

5.4 Minimizing Weak Deterministic Automata 

The breakpoint construction reduces much of the determinization of weak automata
to that of finite-word automata. The similarity extends further. Indeed, like
finite-word automata, weak deterministic automata admit a normal form that is
unique up to isomorphism [Maler and Staiger 1997].

This normal form can be obtained efficiently using an algorithm proposed in
[Löding 2001]. The minimization algorithm consists in locating the strongly connected
components of the graph of the automaton that do not contain any cycle, then
attributing them a new accepting status, according to a rule involving strongly
connected components that are deeper in the graph. This operation does not affect
the language accepted by the automaton, since no run $\pi$ of the automaton can
loop in such strongly connected components, which leaves $\inf(\pi)$ unchanged.
Hopcroft's classical algorithm for minimizing finite-word automata [Hopcroft 1971]
can then be applied directly to the modified weak deterministic automaton in order
to get an equivalent minimal weak deterministic automaton.

When suitably implemented, this algorithm can be run in time $O(n \log n)$, moving
us still closer to the case of automata on finite words.

6. DECIDING LINEAR ARITHMETIC WITH REAL AND INTEGER VARIABLES 

Let us show that the result of Section 3 also applies to the sets of words that
encode sets defined in $\langle \mathbb{R}, \mathbb{Z}, +, < \rangle$. In order to do so, we need to establish that
the topological class $F_\sigma \cap G_\delta$ defined over sets of reals is mapped to its $\omega$-word
counterpart by the encoding relation described in Section 4.

Theorem 6.1. Let $n > 0$ and $r > 1$ be integers, and let
$L(S) \subseteq (\{0, \ldots, r-1\}^n \cup \{\star\})^\omega$ be the set of all the encodings in base $r$ of the
vectors belonging to the set $S \subseteq \mathbb{R}^n$. If the set $S$ belongs to $F_\sigma \cap G_\delta$ (with
respect to Euclidean distance), then the language $L(S)$ belongs to $F_\sigma \cap G_\delta$ (with
respect to $\omega$-word distance).

Proof. Not all infinite words over the alphabet $\Sigma = \{0, \ldots, r-1\}^n \cup \{\star\}$ encode
a real vector. Actually, every arbitrarily small neighborhood of a word that
validly encodes a vector of $\mathbb{R}^n$ contains words that are not valid encodings, namely
the ones containing multiple occurrences of the separator "$\star$" that are far enough
in the word. Let $V$ be the set of all the valid encodings of vectors in base $r$. Its
complement $\overline{V}$ can be partitioned into a set $\overline{V}_0$ containing only words in which the
separator "$\star$" does not appear, and a set $\overline{V}_+$ containing words in which "$\star$" occurs
at least once (including the words that are not valid encodings because of an illegal
sign digit). Formally, we have

- $V = \{0, r-1\}^n \cdot (\Sigma \setminus \{\star\})^* \cdot \{\star\} \cdot (\Sigma \setminus \{\star\})^\omega$,
- $\overline{V}_0 = (\Sigma \setminus \{\star\})^\omega$,
- $\overline{V}_+ = ((\{0, r-1\}^n \cdot \Sigma^* \cdot \{\star\}) \cup (\Sigma \setminus \{0, r-1\}^n)) \cdot \Sigma^* \cdot \{\star\} \cdot \Sigma^\omega$.

By definition, $V$, $\overline{V}_0$ and $\overline{V}_+$ are disjoint, and we have $\overline{V} = \overline{V}_0 \cup \overline{V}_+$. The set
$\overline{V}_+$ has the form $X \cdot \Sigma^\omega$ with $X \subseteq \Sigma^+$, hence it is open.

Similarly, the set $\overline{V}_+ \cup V$ is open since it can be expressed as the union of the
set $\overline{V}_+$, which has just been proved open, and of the set of words beginning with a
valid leading symbol and containing at least one separator, i.e., the language
$\{0, r-1\}^n \cdot \Sigma^* \cdot \{\star\} \cdot \Sigma^\omega$. The latter set is open for the same reason as $\overline{V}_+$.

Let us now consider an open set $S \subseteq \mathbb{R}^n$. Each word $w \in L(S)$ has a neighborhood
entirely composed of words in $L(S)$ (formed by the encodings of vectors that belong
to a neighborhood of the vector encoded by $w$) and of words that contain at least
two separators, which belong to $\overline{V}_+$. Moreover, since $\overline{V}_+$ is open, each word
$w \in \overline{V}_+$ admits a neighborhood fully composed of words in $\overline{V}_+$. Thus, every word in
the language $L' = L(S) \cup \overline{V}_+$ has a neighborhood included in $L'$, implying that $L'$ is
open. Since $L(S) = L' \setminus \overline{V}_+$, we have that $L(S)$ is the intersection of an open and
a closed set.

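The same result holds for a closed set $S \subseteq \mathbb{R}^n$. Indeed, following the same line
of reasoning as above, $L' = L(\mathbb{R}^n \setminus S) \cup \overline{V}_+$ is open because the complement of
$S$ is itself open. On the other hand, we have $L(\mathbb{R}^n \setminus S) = \overline{L(S)} \cap V$. Therefore,
$\overline{L'} = L(S) \cup \overline{V}_0$ holds, hence $L(S) = \overline{L'} \setminus \overline{V}_0 = \overline{L'} \cap (\overline{V}_+ \cup V)$. The last relation
entails that $L(S)$ is the intersection of a closed and an open set.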

We are now ready to address the case of a set $S \subseteq \mathbb{R}^n$ that belongs to $F_\sigma \cap G_\delta$.
Since $S$ is in $F_\sigma$, it can be expressed as a countable union of closed sets $S_1, S_2, \ldots$.
It has been shown that the languages $L(S_1), L(S_2), \ldots$ are Boolean combinations
of open and of closed sets, and thus belong to the topological class $F_\sigma$. Therefore,
$L(S) = L(S_1) \cup L(S_2) \cup \cdots$ is a countable union of sets in $F_\sigma$, and belongs itself
to $F_\sigma$. Now, since $S$ is in $G_\delta$, it can also be expressed as a countable intersection
of open sets $S'_1, S'_2, \ldots$. The languages $L(S'_1), L(S'_2), \ldots$ belong to the topological
class $G_\delta$. Hence, $L(S) = L(S'_1) \cap L(S'_2) \cap \cdots$ is a countable intersection of sets in
$G_\delta$, and thus belongs itself to $G_\delta$. Therefore, we have $L(S) \in F_\sigma \cap G_\delta$. □

Knowing that the language of the encodings of any set definable in the theory
$\langle \mathbb{R}, \mathbb{Z}, +, < \rangle$ belongs to $F_\sigma \cap G_\delta$, we use the results of Section 5 to conclude the
following.

Theorem 6.2. Every deterministic RVA representing a set definable in
$\langle \mathbb{R}, \mathbb{Z}, +, < \rangle$ is inherently weak.

This property has the important consequence that the construction and the ma- 
nipulation of RVA obtained from arithmetic formulas can be performed effectively 
by algorithms operating on weak deterministic automata. Precisely, to obtain an 
RVA for an arithmetic formula one can proceed as follows. 

For equations and inequalities, one uses the constructions given in [Boigelot et al.
1998] to build weak RVA. Computing the intersection, union, and Cartesian product
of sets represented by RVA simply reduces to performing similar operations with
the languages accepted by the underlying automata, which can be done by simple
product constructions. These operations preserve the weak nature of the automata.

Fig. 3. Periodic tiling with triangles.

To complement a weak RVA, one determinizes it using the breakpoint construction,
which is guaranteed to yield an inherently weak automaton (Theorem 6.2) that is
easily converted to a weak one. This deterministic weak RVA is then complemented
by inverting the accepting or non-accepting status of each of its components, and
then removing from its accepted language the words that do not validly encode a
vector (which is done by means of an intersection operation).
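
The two Boolean operations described above, intersection by a product construction and complementation by inverting statuses, can be sketched in Python for complete deterministic weak automata. An automaton is modelled here as a triple `(init, delta, accepting)` with `delta[q][a]` the successor of `q` on `a`; this encoding, the function names and the two toy automata are assumptions of the example, and the final intersection with the automaton of valid encodings that an RVA requires is left out.

```python
from itertools import product


def intersection(aut1, aut2, alphabet):
    """Product automaton; a pair is accepting when both of its states are."""
    (i1, d1, f1), (i2, d2, f2) = aut1, aut2
    delta = {(p, q): {a: (d1[p][a], d2[q][a]) for a in alphabet}
             for p, q in product(d1, d2)}
    accepting = {(p, q) for p in f1 for q in f2}
    return (i1, i2), delta, accepting


def complement(aut):
    """Invert the status of every state (complete deterministic weak input)."""
    init, delta, accepting = aut
    return init, delta, set(delta) - accepting


def accepts(aut, prefix, cycle):
    """Büchi acceptance of the word prefix . cycle^omega."""
    init, delta, accepting = aut
    q = init
    for a in prefix:
        q = delta[q][a]
    first_seen, rounds = {}, []
    while q not in first_seen:  # iterate the cycle until its entry state repeats
        first_seen[q] = len(rounds)
        visited = set()
        for a in cycle:
            q = delta[q][a]
            visited.add(q)
        rounds.append(visited)
    recurring = set().union(*rounds[first_seen[q]:])  # visited infinitely often
    return bool(recurring & accepting)


# Two toy weak deterministic automata over {a, b}.
contains_b = (0, {0: {"a": 0, "b": 1}, 1: {"a": 1, "b": 1}}, {1})
starts_with_a = ("s", {"s": {"a": "y", "b": "n"},
                       "y": {"a": "y", "b": "y"},
                       "n": {"a": "n", "b": "n"}}, {"y"})

both = intersection(contains_b, starts_with_a, "ab")
print(accepts(both, "a", "b"))                   # True
print(accepts(both, "b", "a"))                   # False
print(accepts(complement(contains_b), "", "a"))  # True
```

Because every component of a weak automaton is uniformly accepting or non-accepting, a run of the product settles in a pair of components, and the pairwise test on states suffices.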

An existential quantifier can be applied to a set represented as an RVA by using
the construction detailed in Section 4. This operation does not affect the weak
nature of the automaton, which can then be determinized by the breakpoint
construction. The determinization algorithm necessarily produces an inherently
weak RVA, which is easily converted to a weak automaton.

Thus, in order to decide whether a formula of $\langle \mathbb{R}, \mathbb{Z}, +, < \rangle$ is satisfiable, one
simply builds an RVA representing its set of solutions, and then checks whether this
automaton accepts a nonempty language. This also makes it possible to check the
inclusion or the equivalence of sets represented by RVA. The main result of this
paper is that, at every point of the interpretation of a formula, the constructed
automaton remains weak and thus only the simple breakpoint construction is needed
as a determinization procedure.
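
The final emptiness test is a plain graph search: a Büchi automaton accepts some word exactly when a reachable accepting state lies on a cycle. A minimal Python sketch follows, assuming the same hypothetical dictionary encoding of transitions (`delta[q][a]` is the successor of `q` on `a`, missing entries meaning no transition); the function names are illustrative only.

```python
def reach(start, delta):
    """States reachable from `start` by at least one transition."""
    seen, stack = set(), list(delta.get(start, {}).values())
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(delta.get(q, {}).values())
    return seen


def is_empty(init, delta, accepting):
    """True when no reachable accepting state can be visited infinitely often."""
    reachable = {init} | reach(init, delta)
    return not any(q in accepting and q in reach(q, delta) for q in reachable)


# The accepting state 1 is reachable but lies on no cycle: empty language.
dead_end = {0: {"a": 1}, 1: {}}
# Here state 1 has a self-loop, so the word a b b b ... is accepted.
looping = {0: {"a": 1}, 1: {"b": 1}}

print(is_empty(0, dead_end, {1}))  # True
print(is_empty(0, looping, {1}))   # False
```

Inclusion of one represented set in another then reduces to the emptiness of an intersection with a complement, and equivalence to two such inclusion tests.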

Finally, as weak deterministic automata can be efficiently minimized, each con- 
structed automaton can be reduced down to a normal form. This is particularly 
useful from a practical point of view, since it speeds up the comparisons between 
sets by reducing them to structural tests on the automata, and since it prevents 
the representations from becoming unnecessarily large. 

7. EXPERIMENTS 

The decision procedure proposed in this paper has been implemented successfully
in the LASH toolset, a package based on finite-state automata for representing
infinite sets and exploring infinite state spaces [LASH].

Fig. 4. Weak RVA representing the periodic tiling in binary.

Various experiments have been carried out with the RVA package. For instance, it
is possible to represent the set of Figure 3, which combines discrete and continuous
features, by a weak RVA. Indeed, this set is defined by the following formula of the
additive theory over the reals and integers:

\[ \{ (x_1, x_2) \in \mathbb{R}^2 \mid (\exists x_3, x_4 \in \mathbb{R})(\exists x_5, x_6 \in \mathbb{Z})\,
(x_1 = x_3 + 2x_5 \wedge x_2 = x_4 + 2x_6 \wedge x_3 > 0 \wedge x_4 < 1 \wedge x_4 > x_3) \}. \]

This set admits the compact minimal representation of Figure 4. 
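
Membership of a given point in this periodic set can also be checked directly, which gives a simple reference against which an automaton-based representation could be compared. The Python sketch below assumes the constraints read $x_1 = x_3 + 2x_5$, $x_2 = x_4 + 2x_6$, $x_3 > 0$, $x_4 < 1$ and $x_4 > x_3$ with integer $x_5$, $x_6$; the lower bound 0 and the strictness of the inequalities are a reading of the formula rather than a certainty, and the function name is hypothetical. Since any solution has both $x_3$ and $x_4$ strictly between 0 and 1, the only candidates are $x_5 = \lfloor x_1/2 \rfloor$ and $x_6 = \lfloor x_2/2 \rfloor$.

```python
from fractions import Fraction
from math import floor


def in_tiling(x1, x2):
    """Membership test for the periodic set, using exact rational arithmetic."""
    x1, x2 = Fraction(x1), Fraction(x2)
    x3 = x1 - 2 * floor(x1 / 2)  # unique candidate with 0 <= x3 < 2
    x4 = x2 - 2 * floor(x2 / 2)  # unique candidate with 0 <= x4 < 2
    return 0 < x3 and x4 < 1 and x4 > x3


print(in_tiling("1/4", "1/2"))   # True: inside the base triangle
print(in_tiling("9/4", "-7/2"))  # True: the same point shifted by (2, -4)
print(in_tiling("1/2", "1/4"))   # False: below the diagonal
print(in_tiling("3/2", "1/2"))   # False: x3 = 3/2 exceeds x4
```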

One might fear that the exponential worst-case complexity of the breakpoint
determinization algorithm makes our decision procedure unusable. Experimental
results however show that such a blow-up does not frequently occur in practical
applications. As an illustration, Figure 5 shows the cost of projecting and then
determinizing the finite-state representations of some periodic subsets of $\mathbb{R}^3$
obtained by combining linear constraints with arbitrary coefficients, and then by
inducing a periodicity by means of an integer quantification. The interesting
observation is that the finite-state representations always have fewer states after
the projection than before, whereas an exponential blow-up could have been feared.

Another finite-state representation system, the NDD (Number Decision Diagram)
[Wolper and Boigelot 1995; Boigelot 1998], is based on finite-word automata
and is able to represent the subsets of $\mathbb{Z}^n$ that can be expressed in an extension
of the first-order theory $\langle \mathbb{Z}, +, < \rangle$. Figure 6 compares the size of weak RVA with
that of NDD representing the same subsets of $\mathbb{Z}^3$ obtained by combining linear
constraints with arbitrary coefficients. One notices that the behavior of RVA is very
similar to that of NDD, which are reputed to behave quite well in practice [Wolper
and Boigelot 2000].

These observations suggest that the pathological conditions that lead the
breakpoint construction to blow up are seldom met in practice.

8. CONCLUSIONS 

A probably unusual aspect of this paper is that it does not introduce new algo- 
rithms, but rather shows that existing algorithms can be used in a situation where 
a priori they could not be expected to operate correctly. To put it in other words, 
the contribution is not the algorithm but the proof of its correctness. 

The critical reader might be wondering if all this is really necessary. After all,
algorithms for complementing Büchi automata exist, either through determinization
[Safra 1988] or directly [Büchi 1962; Sistla et al. 1987; Kupferman and Vardi
1997; Klarlund 1991], and the more recent of these are even fairly simple and
potentially implementable. There are no perfectly objective grounds on which to evaluate
"simplicity" and "ease of implementation", but it is not difficult to convince oneself
that the breakpoint construction for determinizing weak automata is simpler than
anything proposed for determinizing or complementing Büchi automata. Indeed,
it is but one step of the probably simplest complementation procedure proposed
so far, that of [Kupferman and Vardi 1997]. Furthermore, there is a complexity
improvement from $2^{O(n \log n)}$ to $2^{O(n)}$, and being able to work with deterministic
weak automata allows minimization [Löding 2001], which leads to a normal form.
Those claims to simplicity and ease of implementation are substantiated by the
experimental results.

Our implementation makes it possible to represent possibly non convex periodic 
sets containing both integers and reals, and to manipulate those sets using Boolean 
operations and quantification, and to check relations existing between them. To the 
best of our knowledge, doing this is beyond the scope of any other implemented tool. 
The potential application field of RVA is wide and ranges from symbolic analysis
of linear hybrid systems [Alur et al. 1995] to temporal databases [Chomicki and
Imieliński 1988; Kabanza et al. 1990].